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I. INTRODUCTION 

We perceive by constructing three-dimensional (3D) 
models from random sightings of objects in different ori- 
entations. Our ability to recognize that we are seeing a 
profile even when the face is unknown pQ suggests that 
perception may rely on object-independent properties of 
the image formation process. The discovery of these 
properties would underpin our mathematical formalisms, 
enhance our computational reach, and perhaps elucidate 
the process of perception. 

Here, we show that image formation by scattering pos- 
sesses specific symmetries, which stem from the nature 
of operations in three-dimensional (3D) space. This fol- 
lows from a theoretical framework, which considers the 
information contained in a collection of random sight- 
ings of an object. An example is a collection of two- 
dimensional (2D) snapshots of a moving head, or a ro- 
tating molecule. Our primary results can be summarized 
as follows. (1) The information gleaned from random 
sightings of an object by scattering onto a 2D detector 
can be represented as a Ricmannian manifold with prop- 
erties resembling well-known systems in general relativ- 
ity and quantum mechanics. As shown below and in a 
subsequent paper, this allows one to efficiently construct 
a 3D model of the object. (2) The information about 
an object can be decomposed into an object-independent 
term reminiscent of a Platonic Form or "ideal object" 
[2], plus an object-specific fingerprint. (3) The object- 
independent term can be used to determine the object 



orientation giving rise to each snapshot independently of 
the object, while the object-specific fingerprint can be 
used for recognition purposes. This seems to have been 
experimentally observed in face recognition in higher pri- 
mates IT]. 



More generally, it is now well-established that numer- 
ical data clouds give rise to manifolds, whose properties 
can be accessed by powerful graph-theoretic methods (3j- 
8 . However, it has proved difficult to assign physical 
meaning to the results [TU] . Using the arsenal of tools 
developed in differential geometry, general relativity, and 
quantum mechanics, we elucidate the physical meaning 
of the outcome of graph-theoretic analysis of scattering 
data, without the need for restrictive a priori assumptions 
Finally, perception can be formulated as learning 
some functions on the observation manifold. Our ap- 
proach is then tantamount to machine learning with a 
dictionary acquired from the empirically accessible eigen- 
functions of well-known operators. 



After a brief conceptual outline in Sec. [3TJ we summa- 



rize relevant previous work in Sec. Ill The symmetry- 
based scheme for analyzing scattering data is developed 
in Sec. |IV( and the utility of these concepts demonstrated 
in Sec. |V| in the context of simulated images from X-ray 
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scattering. We present our conclusions in Sec. VI Mate- 
rial of a more technical nature, including mathematical 
derivations and algorithms, is provided in Appendices [X] 
and [5] A movie of a reconstructed object in 3D is pre- 
sented as supplementary online material [12 . Further 
applications are described in a subsequent paper [13j . 
hereafter referred to as Paper II. 
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II. CONCEPTUAL OUTLINE 

We are concerned with constructing a model from 
sightings of a system viewed in some projection, i.e., by 
accessing a subset of the variables describing the state of 
the system. A 3D model of an object, for example, can 
be constructed from an ensemble of 2D snapshots. Each 
snapshot can be represented by a vector with the pixel in- 
tensities as components. A collection of snapshots then 
forms a cloud of points in some high-dimensional data 
space (Fig. [IJ. In fact, the cloud defines a hypersurface 
(manifold) embedded in that space, with its dimensional- 
ity determined by the number of degrees of freedom avail- 
able to the system. Snapshots from a rotating molecule, 
for example, form a 3D hypersurface. 

This perspective naturally leads one to use the tools of 
differential geometry for data analysis, with the metric 
playing a particularly important role. In non-technical 
terms, one would like to relate infinitesimal changes in a 
given snapshot to the corresponding infinitesimal changes 
in orientation, giving rise to the changes in the snapshot. 
In other words, one would like to relate the metric of 
the data manifold produced by the collection of snap- 
shots to the metric of the manifold of rotations. This 
would allow one to determine the rotation operation con- 
necting any pair of snapshots. The problem, however, is 
that the metric of scattering manifolds is not simply re- 
lated to that of the rotation operations. We solve this 
problem in two steps. First, we show that the metric 
of the data manifold can be decomposed into two parts, 
one with high symmetry, and an object-specific "resid- 
ual" with low symmetry. Second, using results from gen- 
eral relativity and quantum mechanics, we show that the 
eigenfunctions of the high-symmetry part are directly re- 
lated to those of the manifold of rotations. This allows 
one to deduce the snapshot orientations from the high- 
symmetry part, which is the same for all objects, and use 
the object-specific part as a fingerprint of each object. 

We conclude this section with four observations. First, 
real datasets necessarily contain a finite number of ob- 
servations. As such, they must be treated by graph- 
theoretic means, which tend to the differential geomet- 
ric limit under appropriate conditions. This issue is ad- 
dressed below, as needed. Second, we have not distin- 
guished between image and diffraction snapshots. If the 
dataset consists of the latter, each snapshot, or a recon- 
structed 3D diffraction volume must be inverted by so- 
called phasing algorithms [T4WT6] , As this procedure is 
well established, we do not address it further. Third, 
the discussion is restricted to the effect of operations on 
objects with no symmetry. The case where the object 
itself has specific symmetries will be treated elsewhere. 
Finally, the knowledge gained in the course of our analy- 
sis is sufficient to navigate from any starting point to any 
desired destination on the manifold; i.e., given any snap- 
shot, produce any other as required. This is, of course, 
tantamount to possessing a 3D model of the object. As 
this approach is somewhat unfamiliar, we provide actual 



Snapshot 




FIG. 1. Intensity values of a snapshot on a detector with 
n pixels, represented as an n-dimensional vector. Changes in 
the object or its position alter the snapshot, causing the vector 
to trace out a manifold in the n-dimensional data space. The 
dimensionality of manifold is determined by the number of 
degrees of freedom available to the object. For instance, the 
rotations of a rigid object give rise to a 3D manifold. 



3D models to demonstrate the power of our approach. 



III. PREVIOUS WORK 

In order to place our contribution in context, we 
present a brief summary of previous work. As a review 
is beyond the scope this paper, the summary is necessar- 
ily brief, leaving out much excellent work in this general 
area. 

When the snapshots emanate from known object orien- 
tations and the signal is adequate, standard tomographic 
approaches |I7j are routinely employed. 3D models can 
also be constructed when the snapshot orientations are 
unknown and the signal is low [18] [signal-to-noise ra- 
tio (SNR) 5 dB], or extremely low [TUT] (~ 1(T 2 

scattered photons per Shannon pixel with Poisson noise 
and background scattering). Finally, efforts are under- 
way to recover structure, map conformations, and deter- 
mine dynamics ("3D movies") from random sightings of 
non-identical members of heterogeneous ensembles |21| - 
[2"3"| and/or evolving systems [2"T1 123] . 

Data-analytical approaches now include powerful 
graph-theoretic and/or differential geometric means to 
deduce information from the global structure of the 
data representing the (nonlinear) correlations in the 
dataset [5HS1 HH US]. Ideally, this structure takes 
the form of a low-dimensional manifold in some high- 
dimensional space dictated by the measurement appara- 
tus (see Fig. [I]). These so-called manifold-embedding ap- 
proaches are in essence nonlinear (kernel) principal com- 
ponents techniques [26] . which seek to display in some 
low-dimensional Euclidean space the manifold represent- 
ing the correlations in the data. While powerful, such 
approaches face three challenges: computational cost; ro- 
bustness against noise; and the assignment of physical 
meaning to the outcome of the analysis, i.e., physically 
correct interpretation. 

Bayesian manifold approaches |19J [27] and their equiv- 
alents [iniHE] are able to operate at extremely low signal, 
but require prior knowledge of the manifold dimensional- 
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ity and its physical meaning. They also display extremely 
unfavorable scaling behaviors [T9H2"Tj [28] . It has thus not 
been possible, for example, to reconstruct objects with 
diameters exceeding eight times the spatial resolution, 
severely limiting the size of amenable objects and/or the 
resolution of the reconstruction. 

Non-Bayesian graph-theoretic methods are computa- 
tionally efficient, but tend not to be robust against noise 
[53 [50]. More fundamentally it has proved difficult to as- 
sign physical meaning to the outcome of graph-theoretic 
analyses [5], in the sense that the meaning of the di- 
mensions in which the data manifold is embedded is un- 
known. Strategies to overcome this problem have in- 
cluded exploring the graph-theoretic consequences of spe- 
cific changes in model systems [10] , making restrictive as- 
sumptions about the nature of the data, and/or extract- 
ing specific information from the data first and subjecting 
this information to graph-theoretic analysis [9] [11] . 

In this paper, we present a computationally efficient 
theoretical framework capable of interpreting the out- 
come of graph-theoretic analysis of scattering data with- 
out restrictive assumptions. Using this approach, we 
demonstrate 3D structure recovery from 2D diffraction 
snapshots of unknown orientation at computational com- 
plexities four orders of magnitude higher than hitherto 
possible [Tni[2n]- In Paper II, we show that this frame- 
work can be used to: (1) recover structure from simu- 
lated and experimental snapshots at signal levels ~ 10 x 
lower than currently in use; and (2) reconstruct time- 
series (movies) from ultralow-signal random sightings of 
evolving objects. In sum, therefore, our approach offers a 
powerful route to recovering structure and dynamics (3D 
movies) from ultralow-signal snapshots with no orienta- 
tional or timing information. 



IV. SYMMETRY-BASED ANALYSIS OF 
SCATTERING DATA 

In this section we develop a mathematical framework 
for analyzing ensembles of 2D snapshots, using far-field 
scattering by a single object as a model problem to focus 
the discussion (see Sec. IV A). The basic principle of our 
approach, laid out in Sec. |IVB] is that the data acquisi- 
tion process can be described as a manifold embedding 
& [3UE2], mapping the set of orientations of the object, 
SO(3), to the Hilbert space of snapshots on the detector 
plane. As a result, the differential-geometric properties 
of the rotation group formally carry over to the scattering 
dataset. In particular, the dataset can be equipped with 
a homogeneous Riemannian metric B, whose Laplacian 
eigenfunctions are the well-known Wigner D-functions 
[551455] . Here, a metric is called homogeneous if any 
two points on the data manifold can be connected via 
a transformation that leaves the metric invariant. We 
refer to this class of transformations as symmetries. In 
Sec. |IV C[ we show that taking advantage of symmetry, 
as manifested in the properties of homogeneous metrics 



on SO(3) [55], leads to a powerful means for recovering 
snapshot orientations and hence the 3D structure of ob- 
jects. 

A number of sparse algorithms [5J [5] are able to com- 
pute discrete approximations of Laplacian eigenfunctions 
directly from the data. However, these algorithms do 
not provide the eigenfunctions associated with B, but 
rather the eigenfunctions of an induced metric g associ- 
ated with the embedding <P (i.e., the measurement pro- 
cess). The properties of this induced metric are discussed 
in Sec. |IVD| There, we show that g is not homogeneous, 
but admits a decomposition into a homogeneous metric 
plus a residual. Intriguingly, the homogeneous part of g 
corresponds to a well-known solution of general relativity 
(the so-called Taub solution [57]), which has the impor- 
tant property of preserving the Wigner D-functions as 
solutions of the Laplacian eigenvalue problem [55], As 
described in Sec. IV E| (and demonstrated in Sec. [V] and 
Paper II), this property applies to a broad range of scat- 
tering modalities, and can be exploited to perform highly 
accurate 3D reconstruction in a computationally-efficient 
manner. 



A. Image formation 

We first treat image formation as elastic, kinematic 
scattering of unpolarized radiation onto a far-field de- 
tector in reciprocal space [55], where each incident pho- 
ton can scatter from the object at most once (kine- 
matic scattering), and energy is conserved (elastic scat- 
tering). We show later that our conclusions are more 
generally applicable. In this minimal model, illustrated 
in Fig. [2] an incident beam of radiation with wavevec- 
tor gi = Qz (we set Q > by convention) is scattered 
by an object of density p{u) with Patterson function 
P(u) = J du' p(u')p(u — u') = P(—u), where u and 
u' are position vectors in R 3 . The structure-factor am- 
plitude at point r on a detector plane fixed at right angles 
to the incident beam is given by the usual integral, 



a(r) 



oj(r) J du P(u) exp(iu • q(r)) 



1/2 



(1) 



where w(r) is an obliquity factor proportional to the solid 
angle subtended at u = by the detector element at r, 
and q(r) is the change in wavevector due to scattering. 
In elastic scattering we have 



q(r) = q 2 (r) - £?i, 
q2{r) — Q(sin0cos(/>, sin0sin</>, cos< 



(2) 



where q 2 (r) is the scattered wavevector, and 6 and <f> are 
the polar and azimuthal angles in reciprocal space corre- 
sponding to position r on the detector plane (see Fig. |2| . 
Note that, by the convolution theorem, the Fourier trans- 
form T(P) of the Patterson function is equal to the non- 
negative function ^(p)! 2 . As a result, no modulus sign 



4 




FIG. 2. Geometry of image formation by scattering, qi and 
q2 represent the wavevectors for the incident and scattered 
radiation, respectively, (x, y, z) the lab frame, and (r, (f>) co- 
ordinates in the detector plane. 



is needed in Eq. Q, and a 2 (r 3 ) is equal to the intensity 
Ij at pixel j in Fig. [T] 

In an idealized, noise- free experiment involving a single 
object conformation, one observes a sequence of s snap- 
shots on a detector of infinite extent, with the snapshots 
arising from random orientations of the object. Each 
snapshot is obtained from Eq. ([I]) by replacing P(u) with 
Pr(u) — P(R~ 1 u), where R is a 3 x 3 right-handed rota- 
tion matrix; i.e., R T R = I and det(R) = 1. 

The set of all matrices satisfying these conditions form 
the 3D rotation group, SO(3). It is well known that 
SO(3) is a Lie group, i.e., it is a diffcrentiable manifold 
(in this case of dimension 3) [3TJ [35J [3J5] • Among the sev- 
eral parameterizations (coordinate charts) of SO (3), of 
interest to us here will be Euler angles, unit quaternions, 
and hyperspherical coordinates [ID] . The last coordinate 
system stems from the fact that SO (3) has the topology 
of a three-sphere with its antipodal points identified. 

For the remainder of this section, SO (3) will play the 
role of the latent manifold S, i.e., the set of degrees of 
freedom available to the object. In more general ap- 
plications, the latent manifold would be augmented to 
contain the additional degrees of freedom, provided, of 
course, that these degrees of freedom admit a manifold 
description — a natural requirement for operations such 
as shifts and smooth conformational changes. 



B. Riemannian formulation of image formation 

We are concerned with intensity patterns generated by 
scattering, i.e., a subset of all possible patterns on a pla- 
nar detector, referred to as data space. It is reasonable to 
expect that in a physical experiment involving a finite- 
power beam, the resulting distributions of structure- 
factor moduli belong to the set of square-integrable func- 
tions on the detector plane, L 2 (M. 2 ). This is a Hilbert 
space of scalar functions fi equipped with the usual in- 



ner product 

(A,/ 2 ) = J drf^Mr), (3) 
and corresponding norm 

Wfi\\ = (fiJi) 1/2 . (4) 

In many respects, L 2 (M. 2 ) can be though of as a gener- 
alization of Euclidean space to infinite dimensions. In 
particular, the L 2 norm induces a distance ||/i — /2I 
analogous to the standard distance in finite-dimensional 
Euclidean space. Moreover, viewed as a manifold [41j . 
L 2 (M. 2 ) has the important property that its elements are 
in one-to-one correspondence with its tangent spaces. 
Thus, the inner product in Eq. ([3| can be interpreted 
as a metric tensor acting on pairs of tangent vectors on 
L 2 (M. 2 ), or manifolds embedded in L 2 (R 2 ). 

We describe the image formation process as an embed- 
ding |3ll [32], : S i-> L 2 (R 2 ), taking the latent manifold 
into data space. For instance, in the present application 
with S — SO(3), we have the explicit formula ^(R) = <ir, 
where R is an SO (3) rotation matrix and or is the pat- 
tern on the detector given by Eq. with P(u) replaced 
by P(R~ 1 m). 

The image M. = 'P(S) of the latent manifold in data 
space is called the data manifold (see Fig. |Tj) . In the ab- 
sence of degeneracies such as object symmetry, the data 
and latent manifolds are diffeomorphic manifolds, i.e., 
completely equivalent from the point of view of differ- 
ential geometry. In contrast with manifolds of shapes 
[24] 142] which can contain singularities, these are mani- 
folds of operations and thus generally well-behaved. The 
image of the latent manifold in data space is then a 
smooth (here, three-dimensional) embedded hypersur- 
face, which preserves the topology of the latent mani- 
fold, and the structure of its tangent spaces [53]. Per- 
ception, at least in this simple model, can be viewed as 
understanding the map between the latent manifold of 
orientations and the data manifold of pixel intensities. 

The inner product induced on the data manifold by 
the inner product in data space leads to a metric tensor 
g on the latent manifold, which encodes the properties 
of the object and the imaging process. This metric is is 
constructed by converting the inner product of L 2 (IR 2 ) in 
Eq. ([3| to an equivalent inner product between tangent 
vectors on the data manifold. Specifically, given tangent 
vectors V\ and on 5, the induced metric g is defined 
through the action 

S(i*,t*) = (*>i),**(«2)), ( 5 ) 

where <!>* is the so-called derivative map associated with 
[4TJ[44], carrying along tangent vectors on S to tangent 
vectors on M. . The induced metric can be expanded in a 
suitable tensorial basis for S as a 3x 3 symmetric positive- 
definite (SPD) matrix with components g^ Vl viz., 

3 

.9 = E 9^E\ (6) 
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where {E 1 , E 2 , E 3 } is a basis of dual vector fields on S. 
Note that ds 2 = g(Sv, Sv) corresponds to the squared dis- 
tance between two nearby points on S with relative sep- 
aration Sv. The integral of ds 2 over curves on S provides 
the data manifold with a notion of length and distance. 

For the purposes of the symmetry analysis ahead, 
it is useful to consider expansions of g in a basis of 
right-invariant vector fields [31U39j . where, following the 
derivation in Appendix [Aj the components of g at orien- 
tation R are found to be 



ff^(R) 



dru(r) V • [J M qa R (q)]V • [J„qa R (q) 



\q(r) ■ 

(7) 

In the above, J dr denotes integration over the detector 
plane; q(r) is the change in wavevector given by Eq. ([2]); 
V = (d/dq x ,d/dq y ,d/dq z ) is the gradient operator in 
reciprocal space; and are the 3x3 antisymmetric 



matrices in Eq. (A5) generating rotations about the x, 
y, and z axes, respectively. As stated above, czr is the 
structure-factor amplitude corresponding to orientation 
R. 



C. Symmetries 

We now show that the Riemannian formulation de- 
scribed above reveals important symmetries, which can 
be used to determine the object orientation separately 
from the object itself. A fundamental concept in the 
discussion below is the notion of an isometry |31l I44j . 
Broadly speaking, an isometry is a continuous invertible 
transformation that leaves g invariant. More specifically, 
any diffcomorphism <j) : S i— >■ S mapping the latent man- 
ifold to itself induces a transformation </>* acting on ten- 
sors of the manifold; the latter will be an isometry if 
4>*(g) = g holds everywhere on S. A symmetry, there- 
fore, in this context is an operation that leaves distances 
on the data manifold unchanged. 

If the group of isometrics of g acts transitively on S 
(i.e., any two points in S can be connected via a cj> trans- 
formation), the pair (S,g) becomes a Riemannian ho- 
mogeneous space [39] . We refer to any g meeting this 
condition as a homogeneous metric. Riemannian homo- 
geneous spaces possess natural sets of orthonormal basis 
functions, analogous to the Fourier functions on the line 
and the spherical harmonics on the sphere, which can 
be employed for efficient data analysis [35] ■ It is there- 
fore reasonable to design algorithms that explicitly take 
into account the underlying Riemannian symmetries of 
scattering data sets. 

As discussed in more detail below, the isometry group 
of g in Eq. ([5| is generally not large-enough to induce a 
transitive action; i.e., there exist points on the manifold 
that cannot be mapped to one another through an isom- 
etry. Nevertheless, considerable progress in the interpre- 
tation of datascts produced by scattering can be made 
by establishing the existence of a related metric h, whose 
isometry group meets the conditions for transitivity. As 



shown below, the properties of h can be used to interpret 
the results of data analysis performed on (5, g). By com- 
bining aspects of group theory and differential geometry, 
the general techniques developed here constitute a novel 
approach for analyzing scattering data, and potentially 
other machine-learning applications. 

The canonical classes of symmetry operations in prob- 
lems involving orientational degrees of freedom are the 
so-called left and right multiplication maps [3TJ [3HJ EI] • 
These maps, respectively denoted Lq and i?Q, are pa- 
rameterized by an arbitrary rotation matrix Q in SO(3), 
and act on SO (3) elements by multiplication from the left 
or the right, respectively. That is, 



L Q (R) = QR, Rq(R) = RQ. 



(8) 



Viewed as groups, the collection of all left multiplica- 
tion maps or right multiplication maps are isomorphic to 
SO(3), from which it immediately follows that their ac- 
tion is transitive. Thus, any metric g invariant under Lq 
or Rq (or both) makes (S, g) a Riemannian homogeneous 
space. 

The natural metric for the SO (3) latent manifold of 
orientations with these symmetries is the metric tensor 
B associated with the Killing form of the Lie algebra of 
SO(3) [3TJGIH]- This metric is bi-invariant under left and 
right translations. The corresponding group of isometries 
has SO(3) x SO(3) structure, i.e., it is a six-dimensional 
group. In hyperspherical coordinates (x, #,</>), B is rep- 
resented by the diagonal matrix 



[B^] = diag(l, sin 2 x, sin 2 x sin 2 



(9) 



which is identical to the canonical metric on the three- 
sphere S* 3 . For this reason B is frequently referred to as 
the "round" metric on SO (3). 

A key implication of the symmetries of B pertains to 
the cigenfunctions of the corresponding Laplace-Beltrami 
operator Ab @S], defined as 



A B (/) = -|i?r 1/2 J2 d^B^B^dvf 



(10) 



for \B\ = det^], [B» u ] = [B^]- 1 , and a scalar func- 
tion / on SO (3). It is a well-known result in harmonic 
analysis that the eigenfunctions of Ab are the Wigner D- 
functions [35, 46, 47 , complex- valued functions on SO(3), 
which solve the eigenvalue problem 



A B ^w(R)=i(i 



i)^w(R) 



(ii) 



with positive integer j and integers m and m! in the range 
j] ■ Written out in terms of Euler angles in the zyz 
convention, (a 1 , a 2 , a 3 ), explicit formulas for the j = 1 



G 



D-functions are 

Z?Q (a 1 , a 2 , a 3 ) = cos a 2 , 
Di 10 (a\a 2 , a 3 ) = e™ 1 sin(a 2 )/2 1 / 2 , 
Dl ±1 {a\a\ a 3 ) = sm(a 2 )e ±ia3 /2 1 / 2 , 
D 1 _ 11 (a\a 2 ,a 3 )=e- l( - al+a3 '>cos 2 (a 2 /2), (12) 
Dit-ita 1 , a 2 , a 3 ) = e *(° 1 +« 3 ) cos 2 (a 2 /2), 
L>i u (a\ a 2 , a 3 ) = e^ 1 "* 3 ) sin 2 ( a 2 /2), 
D^iCa 1 , a 2 , a 3 ) = e"^ 1 -* 3 ) sin 2 (a 2 /2). 

As one can check by algebraic manipulation, the above 
ninefold-degenerate set of eigenfunctions can be put into 
one-to-one correspondence with the elements of R. That 
is, it is possible to find linear combination coefficients 

Cpqmm' Such that 



E 



(13) 



m,m' — — 1 



where R pq are the elements of R. Thus, if eigenfunctions 
of As could be accessed by processing the observed snap- 
shots or on the data manifold, e.g., through one of the 
graph-theoretic algorithms developed in machine learn- 
ing [5MB], Eq. ( 13 ) could be used to invert the embedding 
map <P. This would be tantamount to having learned to 
"navigate" on the data manifold. 

The method outlined above for symmetry-based inver- 
sion of <P is also applicable under weaker symmetry con- 
ditions on the metric. In particular, it can be extended 
to certain metrics of the form 



h = £ 1 E 1 E 1 + £ 2 E 2 E 2 + £ 3 E 3 E 3 



(14) 



where E^ are the right-invariant dual basis vectors in 
Eq. ( |A8[ ), and £ fl are non-negative parameters. These 
are the so-called spinning-top metrics of classical and 
quantum-mechanical rotors [Ml EE]j arising also in ho- 
mogeneous cosmological models of general relativity [36l 
1371 149) . In classical and quantum mechanics, h features 
in the Hamiltonian of a rotating rigid body with princi- 
pal moments of inertia given by £ fi . In general relativity, 
£ fl characterize the anisotropy of space in the so-called 
mixmaster cosmological model |49j . 

By construction, metrics in this family are invariant 
under arbitrary right multiplications, i.e., they possess an 
SO(3) isometry group. This is sufficient to make (S, h) 
a Riemannian homogeneous space. If two of the £ fl pa- 
rameters are equal (e.g., l\ = £2), h describes the motion 
of an axisymmetric rotor, or the spatial structure of the 
Taub solution in general relativity |37j . As one may ex- 
plicitly verify, in addition to being invariant under left 
multiplications, the metric of the axisymmetric rotor is 
invariant under translations from the left by the one- 
parameter subgroup associated with rotations about the 
axis of symmetry, which corresponds here to the direc- 
tion of the incoming scattering beam. Therefore, it has a 



larger, SO(3) x SO(2) isometry group than the more gen- 
eral metric of an asymmetric rotor. In the special case 
that all of the £^ are equal, h reduces to a multiple of the 
round metric B in Eq. ([9]). 

Let Aft denote the Laplace-Beltrami operator associ- 
ated with h. It is possible to verify that A^, and the 
Laplacian corresponding to the round metric com- 
mute. That is, it is possible to find eigenfunctions that 
satisfy the eigenvalue problem for Ab and A/, simulta- 
neously. In particular, as Hu |36j shows, the eigenvalue- 
eigenfunction pairs (A^j/^,) °f ^-h can be expressed as 
linear combinations of Wigner D- functions with the same 
j and m quantum numbers: 



E 



A? D 3 

mm' mm' 



(15) 



for some linear expansion coefficients A m , . For the most 
general metrics with l\ 7^ £ 2 ^ £3, no closed- form ex- 
pression is available for either A J mm ,, or the correspond- 
ing eigenvalues X 3 m . However, in the special case of ax- 
isymmetric rotors, any A 3 mm , in Eq. ( 15 ) will produce an 
eigenfunction of A^. This means that the vector space 
spanned by the j = 1 eigenfunctions associated with the 
metric of an axisymmetric rotor (denoted here by y J mm i) 
is nine-dimensional. In particular, Eq. ( |13[ ) can be ap- 
plied directly with replaced by y^ m , . 

The eigenvalue X 3 m corresponding to y 3 mm , in the so- 
called prolate configuration, l\ = £ 2 < is given by 



A-' 



1 

2£[ 



J(j + 1) 



1 

24 



1 

2£* 



(16) 



with a similar formula for the oblate configuration 

£2 > h c 



An important point about Eq. ( 16 1 (and 



its oblate analog) is that all of the j = 1 eigenvalues are 
greater than zero and smaller than the maximal j ' = 2 
eigenvalue; i.e., the j = 1 eigenfunctions can be identi- 
fied by ordering the eigenvalues. This property, which 
does not apply for general asymmetric-rotor metrics, al- 
leviates the difficulty of identifying the correct subset of 
eigenfunctions for orientation recovery via Eq. (13). 



D. Accessing the homogeneous metric 

The method outlined above for symmetry-based inver- 
sion of <P makes direct use of the fact that (S, h) is a 
Riemannian homogeneous space. However, (5, h) is gen- 
erally inaccessible to numerical algorithms, which access 
a discrete subset of the latent manifold equipped with the 
induced metric g, i.e., (S,g). It is therefore important to 
establish the isometries of g, because, for instance, the 
latter govern the behavior of the Laplacian eigenfunctions 
computed via graph-theoretic algorithms [3H3 [3 [8] . 

In general, g is not invariant under arbitrary left or 
right multiplications. The lack of invariance of g under 
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the right multiplication map Rq can be seen by consid- 
ering its components in a right-invariant basis. In par- 
ticular, it can be shown that Rq is an isometry of g if 
and only if the components g^ u in the right-invariant ba- 
sis from Eq. ([7| are invariant under replacing R by RQ, 
which does not hold generally. 

What about the behavior of g under left multiplica- 
tions? Here, an expansion of g in an analogous left- 
invariant basis reveals that g is not invariant under left 
multiplications by general SO (3) matrices, but is invari- 
ant under left multiplication by a rotation matrix Q 
about the beam axis z (see Fig. [2|. This is intuitively 
obvious, because the outcome of replacing a snapshot cir 
with oqr is then a rotation on the detector plane, which 
has no influence on the value of the L 2 inner product used 
to evaluate g. In summary, instead of the six-dimensional 
SO(3) x SO(3) isometry group of B, the isometry group 
of the induced metric is the one-dimensional SO (2) group 
of rotations about the beam axis, which obviously cannot 
act transitively on the three-dimensional latent manifold. 

Despite the insufficiency of g to make {S,g) a Rieman- 
nian homogeneous space, g admits a decomposition of 
the form 



= h + w, 



(17) 



with the following key properties: (1) ft, is a spinning-top 
metric from Eq. (14 1 with i\ = £2 (i.e., an axisymmetric- 
rotor metric, not an arbitrary right-invariant metric). (2) 
w is a symmetric (but not necessarily positive-definite) 
tensor, whose components average to zero over the man- 
ifold, describing the inhomogeneous part of g (see Ap- 
pendix A 2 for a derivation). Moreover, provided that w 
meets a suitable norm constraint, g and h are bi-Lipschitz 
equivalent. This means that the distortion in distances 
on S caused by using g instead of h, and the correspond- 
ing change in the Laplace-Beltrami eigenfunctions, are 
both bounded [50]. The fact that h in Eq. ([17) is the 



metric of an axisymmetric rotor is a direct consequence of 
projection onto a circularly-symmetric 2D detector, and 
therefore an essential aspect of scattering experiments. 

The important point is this: scattering manifolds are 
associated with induced metrics g, which, in themselves, 
possess low symmetry. But they can be decomposed 
into a homogeneous part h with a high SO(2) x SO(3) 
symmetry and thus associated with well-known Lapla- 
cian eigenfunctions independently of the object, plus a 
low-symmetry residual, which depends on the object. In 
essence, one can regard the homogeneous metric h as an 
object-independent "Platonic ideal" [5] version of g, and 
develop algorithms, which make use of the properties of 
h to analyze all scattering datasets. 

The symmetries of h can be exploited in several ways: 
(1) enforce appropriate constraints in Bayesian algo- 
rithms [12]; (2) interpret the results of numerical Ricci 
flow [5TJ [52]; or (3) approximate the eigenfunctions as- 
sociated with the homogeneous metric by a truncated 
set of eigenfunctions associated with the inhomogeneous 
metric. The last possibility is described below. 



Represent each snapshot or ; in the dataset as a point 
in nine-dimensional Euclidean space 1R 9 with coordinates 
given by {ipn, ■ ■ . , V^;))- Here, ipik are eigenvector compo- 
nents of an elliptic operator on a graph constructed from 
the s observed snapshots (ur 1} ■ ■ ■ ,or,) in n-dimensional 
data space (see AppendixjB]). This induces an embedding 
<P of the latent manifold S in M 9 given by 



#(Ri) = (ipii,...,ipi 9 )- 



(18) 



Provided that the number of observed snapshots is suffi- 
ciently large, suitable algorithms (such as Diffusion Map 
of Coifman and Lafon [51 [S3] ) , lead to embedding coor- 
dinates tpik, which converge to the corresponding values 
of the Laplacian eigenfunctions associated with g, even if 
the sampling of the data manifold is non-uniform. More 
specifically, for large-enough s we have the correspon- 
dence 



(19) 



where V'fc(Ri) is the k-th eigenfunction of the Laplace- 
Beltrami operator A g associated with the induced metric 
in Eq. i.e., 



Agip k = AfeVfc (20) 



with 



A s (-) = ~\g\- 1/2 J2 9 » [\9\ 1,2 9^du{-)\ ■ (21) 
Now we explicitly employ the symmetries of the homo- 



geneous metric h in Eq. (14), motivated by the decompo- 
sition of g in Eq. ( fl7[ ) : Express each eigenfunction of the 
homogeneous Laplacian with j = 1 as a linear com- 
bination of the eigenfunctions iftk of the inhomogeneous 
Laplacian, viz., 



Dl 



E 



(22) 



where we have made use of the correspondence in Eq. ( 15 ) 



between Wigner /^-functions and the eigenfunctions asso- 
ciated with the axisymmetric rotor. These eigenfunctions 
are related to the matrix elements of the rotation opera- 
tors Ri, in accordance with Eq. (13). Retaining only the 
first nine ipk, determine the elements of the rotation ma- 



trix by a least-squares fit [Eqs. (B5)] of the coordinates 
computed in Eq. (19) via graph-theoretic analysis. 



The symmetries of h play an essential role in this pro- 
cedure, since they yield: (1) the number and sequence of 
graph-Laplacian eigenfunctions used to represent the la- 
tent manifold in Eq. (18); and (2) the procedure to infer 
the orientations R^ from the eigenfunctions [in particular, 
via the structure of the error functional in Eq. (B5a)]. As 
demonstrated in Sec.[Vj this leads to accurate orientation 
determination for each snapshot. 



E. Extension to general scattering problems 

Because the above analysis was performed under a re- 
strictive set of experimental conditions, we now consider 
the effect of removing these. The assumption of kine- 
matic (single) scattering introduces an additional sym- 
metry due to Friedel's law. As we have not used this 
symmetry, our arguments remain valid under multiple 
scattering conditions. The use of linearly or elUptically 
polarized radiation introduces a second preferred direc- 
tion in addition to that of the beam, thus removing the 
SO(2) isometry under left translations, but this can be 
restored by appropriate correction with the polarization 
factor. A detector not at right angles to the beam axis 
also reduces the isometry to I x SO (3), but this can also 
be easily corrected by an appropriate geometric factor. 
Absorption can be accommodated as a complex density 
function, inelastic scattering by allowing q\ ^ q2, etc. 
Neither affects our conclusions. Our approach is thus 
applicable to a wide range of image formation modali- 
ties. 

Note, however, that the method has to be modified 
when the object has discrete or continuous symmetries. 
This is because the data manifold in this case is not 
SO(3), but the quotient space SO(3)/T where r is a 
subgroup of SO(3) representing the object's symmetries. 
Among the eigenfunctions of the Laplacian on the SO(3) 
manifold, only those that are constant on _T "survive" in 
the SO(3)/f environment. Thus, the orientation recov- 
ery procedure must be modified depending on the form 
of the available eigenfunctions. As mentioned in Sec. [TTJ 
this issue will be addressed elsewhere. 



V. DEMONSTRATION OF STRUCTURE 
RECOVERY IN 3D 

It has long been known that the use of problem-specific 
constraints can substantially increase computational ef- 
ficiency [54) . By combining wide applicability with class 
specificity, symmetries represent a particularly powerful 
example of such constraints. Exploiting these, we here 
demonstrate successful orientation recovery for a system 
computationally 10 4 x more complex than previously at- 
tempted pll HO] ■ In Paper II we apply our framework to 
snapshots of various kinds with extremely low signal. 



A. 3D reconstruction from diffraction snapshots 
with high computational complexity 

We simulated X-ray snapshots of the closed confor- 
mation of E. coli adenylate kinase (ADK; PDB descrip- 
tor 1ANK) in different orientations to a spatial resolu- 
tion d = 2.45 A, using 1 A photons. In this calcula- 
tion, Cromer-Mann atomic scattering factors were used 
for the 1656 non-hydrogen atoms [55], neglecting the hy- 
drogen atoms. We discretized the diffraction patterns 



on a uniform grid of n = 126 x 126 = 15, 876 detector 
pixels of appropriate (Shannon) size, taking the corre- 
sponding orientations uniformly on SO(3) according to 
the algorithm in Ref. |56j . The number of diffraction 
patterns required to sample SO(3) adequately is given 
by 87r 2 (D/rf) 3 ~ 8.5 x 10 5 , where D = 54 A is the diam- 
eter of the molecule [19] . In our simulation, however, a 
larger D = 72 A diameter was assumed, allowing, e.g., 
for the possibility to reconstruct the structure of ADK's 
open conformation. Hence, a total of s = 2 x 10 6 patterns 
were used in the present analysis. 

The diffraction patterns were provided to our Diffusion 
Map-based algorithm with no orientational information, 
and the orientation of each diffraction pattern was deter- 
To 



mined by means of the algorithm in Table III 



[57]. 



estimate the difference between the deduced and true ori- 
entations, respectively represented by unit quaternions fj 
and Tj [see Eq. (B9|], we used the RMS internal angular 
distance error, 



£ = 



1 

s(s-l) 



1/2 



(23) 



In the above, = 2arccos(|Ti • Tj\) and — 

2arccos(|fi • fj\) are respectively the true and estimated 
internal distances between orientations and Rj, and • 
is the inner product between quaternions. The resulting 
internal angular distance error in the present calculation 
was 0.8 Shannon angles. 

Next, we placed the diffracted intensities onto a uni- 
form 3D Cartesian grid by "cone-gridding" [55], and de- 
duced the 3D electron density by iterative phasing with 
a variant of the charge flipping technique |16j developed 
by Marchesini [59] ■ Charge flipping was prevented out- 
side a spherical support about twice the diameter of the 
molecule. The i?-factor between the gridded scattering 
amplitudes and the ones obtained from phasing was 0.19. 
The close agreement with the correct structural model is 
shown in Fig. [3] and the EPAPS movie. 



B. Computational cost 

In the absence of orientational information, the compu- 
tational cost C of orientation recovery without restrictive 
sparsity assumptions scales as a power law 



C cx R 8 = (D/df 



(24) 



with D the object diameter and d the spatial resolution 
[H?ll21j . The analysis of ADK was performed with R — 
30, compared with the largest previously published value 
of R < 8 [T^H2T] . This represents an increase of four 
orders of magnitude in computational complexity over 
the state of the art, as shown below. 

The computational cost of a single expectation- 
maximization (EM) step in Bayesian algorithms (e.g., 
the GTM algorithm [UJ [SOI) scales as Ksn, where K, s, 
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TABLE I. Computational complexity of GroEL and ADK 



FIG. 3. Three-dimensional electron density of the closed con- 
formation of E. coli adenylate kinase, recovered from 2 x 10 6 
diffraction patterns of unknown orientation at 2.45 A resolu- 
tion. Hydrogen atoms were neglected in the electron density 
calculation. The ball-and-stick model represents the actual 
structure. See also the EPAPS movie. 



and n are the number of quaternion nodes, the number 
of snapshots, and the data-space dimension, respectively 
1 1 9 j . Here, the data space dimension is equal to the num- 
ber of Shannon pixels. Assuming an oversampling of 2 
in each linear dimension, we have 



K 



8tt2 (D 
~S 



fK 



Stt 2 / (D 
S 



(25) 



AD 

T 



tan 



2 arcsin 



2d 



1G 



D 



where S is the object symmetry, / the number of snap- 
shots per orientational bin, and A the wavelength. As- 
suming that the number of EM iterations is constant, 
Fung et al. [T9"] estimate an eighth-power scaling of the 
form in Eq. (24) for the total computational cost of ori- 



entation recovery with GTM. The fact that GTM is NP- 
hard remains moot. 

Loh and Elser 20J argue that their EM-based approach 
scales as 



C oc M rot s N phot 

o 



(26) 



where M rot is the number of rotation samples (orienta- 
tional bins) , and that this leads to an R 6 scaling. This is 
based on three assumptions: (1) A sparse formulation can 
be used to replace the number of detector pixels n with 
a much smaller number of scattered photons -/Vphotons- 



^photons 



Af rot 

M rot S ^photons 

Ratio to GroEL 



GroEL 



10 2 

10 6 
2.5 x 10 4 
2.5 x 10 12 
1 



ADK 



Sparse 
representation 
~ 10 3 
2 X 10 6 
2 X 10 6 
4 X 10 15 
1.6 X 10 3 



No sparse 
cprcscntation 
1.6 X 10 4 

2 X 10 6 

2 X 10 6 
6.4 x 10 16 
2.6 X 10 4 



(2) iVphotons does not depend on R, and thus can be ig- 
nored in the scaling behavior. (3) The number of EM 
iterations is independent of R (as in Ref. [19]), and of 
-^photons- Note that M rot and the number of GTM bins, 
K, are equal. 

Assumption (1) holds only for very small photon 
counts and in the absence of significant background scat- 
tering. Assumption (2) is not justified [ST]. For a globu- 
lar protein, the total number of photons scattered to large 
angle scales linearly with the number of non-H atoms, 
and hence object volume, i.e., as D 3 . Thus, 



M, 



J- rot 



8tt2 /D 
S \d 

87T 2 / ( D 



S 



^photons ~ n = 16 



D 



(27) 



leading to the same R 8 scaling behavior as Fung et al. 
The validity of assumption (3) remains moot. 

Equipped with the estimates from Eqs. (24)-(27), we 



now compare the computational complexity of our ADK 
calculation with the reconstructions of the chignolin and 
GroEL molecules by Fung et al. [TS] and Loh and Elser 
[20] . respectively. Using Eq. | |24[ ), the increase in compu- 
tation complexity on going from chignolin (-D c hig = 16 A, 
4hi g = 1-8 A), to ADK (L>adk = 54 A, D ADK ( suppor t) = 
72 A, c?adk = 2.45 A) is 



-CADK (support) /^ADK 
-^chig/^chig 



29.4 
~8J~ 



1.4 x 10 4 



^ (28) 
Using the scaling expression of Eq. ( [26] ) , the increase in 

comple xity on going fro m GroEL to ADK (72 A support) 

for the Loh and Elser| algorithm varies between O(10 3 ) 

and O(10 4 ) for sparse and non-sparse representations, 

respectively (see Table pL As a sparse representation is 

not generally possible and we have not had to resort to 

it, the appropriate comparison is 10 4 . 



VI. CONCLUSIONS 

We have shown that a Riemannian formulation of 
scattering reveals underlying, object-independent sym- 
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metries, which stem from the nature of operations in 3D 
space and projection onto a 2D detector. These sym- 
metries lead to immediate identification of the natural 
eigenfunctions of manifolds produced by data from a 
wide range of scattering scenarios, and thus to physically- 
based interpretation of the outcome of graph-theoretic 
analysis of such data. In practical terms, the ability to 
access the homogeneous metric offers a computationally 
efficient route to determining object orientation without 
object recognition, while the object-dependent term pro- 
vides a concise fingerprint of the object for recognition 
purposes. There are tantalizing indications that face per- 
ception in higher primates may occur in this way pQ . The 
ability to use symmetries to navigate on the data mani- 
fold is tantamount to efficient machine learning in 3D, in 
the sense that given any 2D projection, any other can be 
reconstructed. 

As shown in Paper II, the manifold itself offers a pow- 
erful route to image reconstruction at extremely low 
signal, because snapshots reconstructed from the man- 
ifold achieve higher signal-to-noise ratios than can be 
obtained by traditional methods relying on classification 
and averaging. Combined with the ability to sort random 
snapshots of an evolving system into a time-series, also 
demonstrated in Paper II, our approach offers a radically 
new way for studying dynamic systems in 4D. Fundamen- 
tally, the homogeneous metric describes the transforma- 
tions of objects without reference to any specific object. 
This is reminiscent of a Platonic Form, from which spe- 
cific objects emanate [2]. It is therefore tempting to re- 
gard the homogeneous manifold as a Platonic Form, from 
which our perception of three-dimensional objects stems 
[62]. 
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Appendix A: The induced metric tensor of 
scattering data sets 

Here, we discuss the properties of the induced metric 
tensor g in Eq. ([5| . We begin in Appendix A 1 with a 
derivation of the explicit form for the components of g in 
the right-invariant basis from Eq. Q. In Appendix 



we perform the decomposition of g in Eq. (17) into ax- 
isymmetric and inhomogeneous parts 



A2 



In Appendix |A 3| 
we provide estimates of the error incurred in using the 
Laplacian eigenfunctions associated with the inhomoge- 
neous metric g for orientation recovery. 



Derivation 



Following Sec. |lVBl we describe scattering as an em- 
bedding <P taking the latent manifold S to the set of 
square- integrable intensity patterns L 2 (R 2 ) on a 2D di- 
mensional detector. Broadly speaking, the induced met- 
ric describes an inner product between tangent vectors 
on the latent manifold S associated with that embed- 
ding. More specifically, in accordance with Eq. (f5l), that 
inner product is computed by "pushing forward" tangent 
vectors on S to manifest space, and applying the canon- 
ical Hilbert space inner product in Eq. |3). The map 
carrying along the tangent vectors of S is the derivative 
map <P* associated with <P, which is evaluated as follows. 

First, note that every smooth tangent vector field v 
on S generates a corresponding one-parameter family of 
transformations 1 31 , 32. 1551. 



(Al) 



with a a scalar parameter and R Q an element of S, which 
depends smoothly on a. In the above, (f> a describes a 
curve on the manifold (called an integral curve of v), 
which is tangent to v at every point. 

Likewise, the pushforward <P*(v) oiv is associated with 
a continuous transformation <j) a of intensity snapshots in 
manifest space. Denoting the snapshot associated with 
orientation R by or = <£(R), that transformation is given 
by 



or 



(A2) 



with a,ft a determined from Eq. (All. The outcome of 



acting on v with <P* is then the directional derivative of 
intensity snapshots along the path defined by <p a , viz. 



<P*(v) = lim — ^ — . 



(A3) 



The induced metric g resulting from the above proce- 
dure will depend on the explicit form of the embedding 
map <£. Here, we consider the case where <P describes 
far-field kinematic elastic scattering from a single object, 
where intensity amplitudes are given by the Fourier inte- 
gral in Eq. . Other scattering scenarios can be treated 



11 



in a similar manner, provided that <P meets the condi- 
tions of an embedding. For our purposes, it is sufficient 
to consider one-parameter transformations arising from 
left multiplication by SO (3) matrices carrying out rota- 
tions about one of the x, y, or z axes [see Eq. Q]. That 
is, we set 



L£(R) = exp(cd /1 )R for /xG {1,2,3}, 



(A4) 



where a is the rotation angle in radians, and Ji, J2, and 
J3 are 3x3 antisymmetric matrices generating rotations 
about the x, y, and z axes: 



Ji = 




(A5) 



We denote the ve ctor fields generating <j)^ by e M . It then 
follows from Eq. ( |A3| ) with R a = L£(R) that the push- 
forward fields <!>* (eT) are given by 



**(e M )(0 = N(f)] 1/2 V • [J M qa R ( 9 )] |, = 



(A6) 



Note that in deriving Eq 
divergence-free property V • (J M q) 



(|A6|) we have used the 
= 0, which applies 



for any scattering wavevector q and rotation generator 

As one may verify, the tangent vector fields are lin- 
early independent. This means that the set {ei, 62,63} 
forms a basis to expand tangent vectors on S, and, in 
turn, the induced metric can be represented by a 3 x 3 
matrix with elements 



9w(R) = (**e„,«P*e„). 



(A7) 



The g^y (R) above are the components of the induced met- 
ric in Eq. Q at orientation R, provided that are dual 
basis vectors to e„, defined through the relation 



5* 



(A8) 



Moreover, it is possible to show that e M are invariant 
under the right multiplication map Rq in Eq . (J8 



Substituting for $*(e^) in Eq. (A7) using Eq. (A6l then 
leads to the expression in Eq. (|7j) for the components of 
the induced metric in a right-invariant tensorial basis. 

Besides the form in Eq. Q, it is useful to express the 
components of g in terms of spherical harmonics in recip- 
rocal space. As is customary in diffraction theory [63) . we 
write down the reference scattering amplitude a = #(l) 
corresponding to the identity matrix I using the expan- 
sion 



«(<?) = E E <ww)< 



(A9) 



where a™ are complex functions of the radial coordinate 
q in reciprocal space, and YI n are spherical harmonics. 
Note the property a" 1 * = (—l) m aj m , which is a conse- 
quence of a(q) being real. Making use of the standard 
formula describing rotations of spherical harmonics via 
Wigner /^-matrices, 



Y ] m (R-\e,cj ) ))= 0im'(R)?T'(M), (A10) 

m'=-j 

it follows that the amplitude distribution <2r = ^(R) cor- 
responding to object orientation R is given by 

00 j 

«R(9)=E E a T'(l) DJ rn„Am] n (0^)- (All) 

Inserting the above in Eq. ^ leads to the following gen- 
eral expression for the metric components, 

OO jl J2 

s/-( R )= E E E (-i) roiWi 

ji ,32=0 mi ,mj =-ji m 2 ,m' 2 =—j 2 

xD 11 ,(R)D h ,(R)Kir im{hm2m '\ (A12) 

In the above, K^ v 1 132 2Tri2 are complex-valued co- 
efficients, which vanish unless the following conditions 
are met (together with the corresponding conditions ob- 
tained by interchanging /x and v): 



?7i2 = mi ± 2 or ni2 = mi for (zx, v) 



(1.1) , 

(2.2) , 



m.2 = mi ± 2 for (/x, 1/) = (1, 2) , 

, 1f ( s J (1,3), 
12 = mi ± 1 for (/x, i/) = < ^ 

m2 = mi for (zx, i/) = (3, 3). 



(A13) 



2. Decomposition into homogeneous and 
inhomogeneous parts 

Even though the induced metric tensor g in Eq. ^ is 
not homogeneous, it is nevertheless possible to decom- 
pose it as a sum 



g = h + w, 



(A14) 



3=0 m=—j 



where h is a homogeneous metric with respect to a group 
acting transitively on the data manifold, and w a tensor 
that averages to zero over the manifold. Using Fourier 
theory on SO (3) [35l |64], here we compute the compo- 
nents of h and w in the right-invariant basis from 
Eq. ( |A8| ) . The analysis presented below yields the impor- 
tant results that (i) the homogeneous part h of the metric 
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belongs to the family of metrics associated with axisym- 
metric rotors; (ii) the Fourier spectrum of w has a highly 
sparse structure, with only a limited number of nonzero 
coefficients contributing to metric inhomogeneity. If, as a 
result of the structure of its Fourier spectrum, the norm 
||w|| of w (suitably defined) is smaller than the norm \\h\\ 
of h, then using the numerically-accessible eigenfunctions 
of g to approximate the eigenfunctions of h results in a 
bounded loss of accuracy. 

We begin by noting that the Fourier transform of the 
metric components g M „ in Eq. (A12| consists for each 
(/*,*/) of the sequence {fip^Tpj + 1) x (2j + 1) 
matrices g^, whose components are given by 



v\mm' 



= / dV^g^WDi^R- 1 ). (A15) 



In the above, integration is performed over SO(3) using 
the volume element dV of the bi-invariant metric B in 
Eq. (9b [6^1, and D J mm , are the Wigner D-functions from 
Eq. ( 11 ). The collection of the gi matrices may be used 



to recover g'(R) via the inverse transform 



1 



OO 



fl^(R) = 8^ 2J 2 ^ + l)tr(^^(R)), (A16) 



where D J '(R) = [D^ m ,(R)] is the j-th Wigner matrix of 
size (2j + 1) x (2j + 1). Because D° is the trivial rep- 
resentation of SO(3), a constant scalar on the manifold, 
the term g^ v corresponds to the homogeneous part of 
the metric. The matrices with j > 1 give rise to the 
inhomogeneous tensor w, which encodes object-specific 
information. That is, we have 



•■Jjiv 



8tt 2 ^ 

3 = 1 



(2j + l)tr(.g^"(R)). 



where and w 



(A17) 

are the components of h and w in the 



-E M basis, respectively. 

Since each component of the metric contains a product 



of two Wigner D-functions [see Eq. (A12)], the calcula- 



tion of the Fourier transform is facilitated significantly by 
the triple-integral formula involving the 3-jm symbols of 
quantum mechanical angular momentum [351 147] , 



|#(R)C,(R)< m ,(R)^ m ,(R) 



(-1) 



m+m 



8n 2 



Jl J2 J 
m l m 2 ~ m 



Jl J2 



TO, mb 



3 



(A18) 

In particular, inserting the metric components from 



Eq. ( A12 1 in Eq. (A15|, and making use of the triple- 



integral formula, leads to the result 



i/\mm' 



3 1+3 

E 



32 

E 



Ji=0j 2 =|ji— j| mi,mi=-ji m 2 ,m' 2 = -j 2 

. . ( i \mi+Jn'i -\-m-\-m' 

jim 1 m 1 j2m 2 m 2 \ r ) 

16tt 2 



Jl J2 

—mi ni2 m . 



Ji J2 J 



-HI TOn TO 



(A19) 



where we have employed the triangle inequality, \ji — 
j\\ < J ' < ji +J2i obeyed by t he nonzero 3-jto symbols. 
The key point about Eq. (A19) is that the selection rules 



for the 3-jto symbols strongly restrict the values of the 
azimuthal quantum number to for which [<?^„]mm' is non- 
vanishing. Specifically, it is possible to show that the 
matrix elements [g^jmm' are zero unless 

to G {0, ±2}, for (fx, v) = (1, 1) or 0, v) = (2, 2), 
tog {±2}, for ( A t,i/) = (1,2) J 
tog {±1}, for (p,v) = (1,3) or (/*,»/) = (2,3), 
me{0}, for (fi,v) = (3,3). 

(A20) 

It follows that the j = term of the Fourier spectrum 
(defined only for to = to' = 0) giving rise to the homoge- 
neous part of the metric has no off-diagonal components. 
This in turn means that h can be expressed in terms of 
some non-negative parameters in the form of Eq. ( 14 1. 



An explicit evaluation of the on-diagonal terms yields the 
additional result that the i\ and £2 parameters are, in 
fact, equal. Thus, the components of the induced metric 
read 



= h 



{lis 



[V] = 



Ui 








(A21) 



By construction, the average of u> M „ over the manifold 
vanishes, i.e., 

J dV{R)w^ v (R) = 0. (A22) 



3. Bounding the error in the eigenfunctions 

Given the above decomposition of the induced met- 
ric g into a homogeneous metric h associated with an 
axisymmetric rotor plus an inhomogeneous component 
u>, it is possible to write down sufficient conditions for 
the discrepancy between the Laplace-Beltrami eigenfunc- 
tions associated with g and h to be bounded. First, note 
that h induces a norm || - 1| ft,,oo on tensors of type (0,2) 
on the latent manifold. For a sufficiently smooth tensor 
field u of type (0, 2) that norm is defined as 



\ u \\h. 



sup 



\u(v,v)\ \ 1/2 
h(v, v) 



(A23) 
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where the supremum is taken over SO(3) matrices R and 
nonzero tangent vectors v on SO(3). Thus, ||it||/i )0 o in- 
volves computing the least upper bound over the mani- 
fold and over the nonzero tangent vectors of the relative 
magnitudes of the quadratic forms u(v,v) and h(v,v). 
Next, set u = g = h + w, and compute 



\g\\h. 



sup 



w(v, v) 
h(v, v) 



1/2 



(A24) 



By the direct and reverse triangle inequalities of norms, 
we have the bounds 



II 



10/1,1 



< IMkoo < i + \\w\\ h ,, 



(A25) 



If HHU.oo is strictly less than one, then Eq. (A25) im- 
plies that ||g|U,oo is bounded from below by the nonzero 
constant A- = 1 — ||iy||feoo and from above by A + = 

1 + ||l«|U,oo< 

It follows that if the condition H^H^oo < 1 holds, then 
g and h obey the bound 



j < \\g\\ h ,oo < A 



(A26) 



with A = maxjyll 1 , A + }. This property-is called bi- 
Lipschitz equivalence. Berard et al. [50] show that if the 
Ricci curvatures of two bi-Lipschitz equivalent metrics, 
g and h, are bounded from below by some negative con- 
stant, then there exist constants "q^A), which go to zero 
as A approaches 1, such that for any orthonormal ba- 
sis of eigenfunctions {yi}fLi 01 ^tn one can associate an 
orthonormal basis of eigenfunctions {4>i}iLi 01 ^-g satis- 
fying the bound 



Hi - ytWoo < m{A), 



(A27) 



where ||-||oo denotes the supremum norm. 

This means that if one approximates the set of (pos- 
sibly degenerate) eigenfunctions of with correspond- 
ing eigenvalues \±, . . . , A^ by the eigenfunctions of A g 
with corresponding eigenvalues Ai, . . . , \k (sorted in in- 
creasing order) , then the maximum error is bounded from 
above in a pointwise sense. Here, we do not attempt to 
derive conditions on the properties of the object needed 
to guarantee that H^H^oo < 1 holds, nor do we address 
whether the bi-Lipschitz equivalence between g and h 
can be deduced by imposing weaker constraints on w. 
Nevertheless, to the extent that the norm of w is indeed 
small (as suggested by, e.g., the strong selection rules on 
its Fourier spectrum), we expect that using the leading 
eigenfunctions of g for orientation recovery should pro- 
duce only small systematic error, as observed in practice. 



Appendix B: Algorithms 



As an instructive application of the symmetries iden- 



algorithm for the analysis of scattering datasets. This 
makes explicit use of the properties of the homogeneous 
metric h in Eq. pT| to interpret results produced by the 
Diffusion Map algorithm of Coifman and Lafon [5]. As 
mentioned earlier, and also demonstrated in Paper II, a 
variety of other manifold algorithms can also be used, all 
taking advantage of the symmetries underlying datasets 
produced by scattering. Here, we consider the idealized 
scenario of noise-free data. This will be relaxed in Pa- 
per II, where we extend our approach to deal with snap- 
shots severely affected by Poisson and Gaussian noise. 

Instead of a continuous data manifold A4, experimen- 
tal data represent a countable subset M of M. consist- 
ing of s identically and independently distributed (IID) 
samples in n-dimensional data space (with n the num- 
ber of detector pixels) drawn randomly from a possibly 
non- uniform distribution on M.. That is, we have 



M = {c 



(Bl) 



where g ii = (a,-i, . . . , dj n ) are n-dimensional vectors of 
pixel amplitudes (see Fig. [I]). As described in Sec. IV B 



the amplitudes are given by a.y = <2r. (rj), where fj is the 
position of pixel j in the detector plane, and or ; = <P(Ri) 
is the snapshot associated with orientation R^. 

In Ref. [5] it is shown that it is possible to construct 
a one-parameter family of diffusion processes (random 
walks) on the point cloud from Eq. (Bl |, with each pro- 
cess described by an s x s transition probability matrix 
P c , such that in the limit e^O and s —> oo the eigenvec- 
tors of P e converge to the eigenfunctions of the Laplace- 
Beltrami operator A g associated with the distance metric 
in data space [i.e., the induced metric tensor in Eq. jHJ]. 
More specifically, if ip are s-dimensional column vectors 



in the eigenvalue problem 



(B2) 



then the relation ipn- ipk(Ri) holds for large-enough s 
and small-enough e, where V'fe(Ri) is the A:-th eigenf unc- 



tion of A g in Eq. (21 1 evaluated at element R^ on the 



latent manifold. 

Central to the efficiency of Diffusion Map is the fact 
that the transition probability matrix P e is highly sparse. 
This is because P e is constructed by suitable normaliza- 
tions of a matrix W assigning weights Wij = A^a^a^) 
via a Gaussian kernel 



^fe,a-) =cxp(-||a 4 -a-|| 2 / £ ) 



(B3) 



to pairs of snapshots in M, depending on their Euclidean 
distance in data space. In applications, one typically fixes 
a positive integer d and for each snapshot a.j retains the 
distances up to its d-th nearest neighbor. Hereafter we 
denote the index of the j-th nearest neighbor of a Li by Nij , 



tified in Sec. IV we describe an accurate and efficient 



and the corresponding distance by Sij — 
Pseudocode for computing P e given the s x d distance 
and index matrices, S 
Table El 



[Sij] and N = [JVy], is listed in 
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TABLE II. Calculation of the sparse transition probability 
matrix P e in Diffusion Map, following Coifman and Lafon [8]. 

Inputs: 

s X d distance matrix S 

s X d nearest-neighbor index matrix N 

Gaussian width e 

Normalization parameter a 

Outputs: 

s X s sparse transition probability matrix P 
1: Construct an s X s sparse symmetric weight matrix W, such that 



1, if i = j, 

lexp(-S? fe /e), ifi = iV ifc , 

W 3Z , if W lj / 0, 

1 0, otherwise. 



2: Evaluate the s X s diagonal matrix Q with nonzero elements Qa — 

3: Form the anisotropic kernel matrix K — Q a WQ a . 

4: Evaluate the s X s diagonal matrix D with nonzero elements Da — 

5: return P e = D X K 



Once the eigenvectors in Eq. (B2) have been evalu- 



ated, the next step in the orientation-recovery process 
is to evaluate the linear-combination coefficients needed 
to convert the nine-dimensional embedding coordinates 
(tplll, ■ . ■ , ipw) in Eq. ( 18 1 to elements of an approximate 



rotation matrix R;. This conversion is performed by the 
discrete analog of Eq. (13 1, namely 



[Rj]« = X! C ijk1plkl, 1<1<S. 



(B4) 



fc=i 



Here the expansion coefficients c^k arc determined by a 
least-squares minimization of the error functional 



0({cy fc }) = 53G?({c0 fc }), 



1=1 



Gf({c t3k }) = \\RfR l -\\\ 2 + \det(R l )-l\ 

9 

[R/]y = ^ Cijklplk, 



(B5a) 
(B5b) 
(B5c) 



fc=i 



where II Ml 



1/2 



Si j=i ) denotes the Frobenius 



norm (the entrywise two-norm) of a 3 x 3 matrix M = 

[iky. 

Because the eigenfunctions tpk are not exact linear 
combinations of the Wigner D-functions in Eq. (12), a 



(real) polar decomposition step may be applied to pro- 
duce an exact rotation matrix R, given an approximate 
rotation matrix estimated from the eigenfunction data 
via Eq. (B4). The real polar decomposition is a factor- 



ization of a real square matrix R into the product 

R = RS, (B6) 

where R is an orthogonal matrix, and S is symmetric 
positive-semidefinite matrix. 



The nonlinear least-squares step may be adequately 
performed using only a small subset of the full dataset, 
consisting of r datapoints drawn randomly from M. For 
instance, in the results of Sec. |Vj where the total number 
of samples is s = 2 x 10 6 , we use r — 8 x 10 4 data- 
points. We denote the total squared residual of the op- 
timization process by Q* , where Q* is equal to G{{cijk}) 
with {cijk} given by the minimizer of the 



in Eq. (B5a 



error functional. 

Besides R, one might additionally require an evaluation 
of the corresponding coordinates in an SO(3) coordinate 
chart (e.g., for diffraction-pattern gridding). Depending 
on the coordinate chart of interest, various techniques are 
available to perform this procedure stably and efficiently. 
For instance, the coordinates of R in the unit-quaternion 
parameterization |40j can be determined by making use 
of the trace formula 



tr(R) = 1 + cosx 
and the eigenvector relation 

Rn = n, 



(B7) 



(B8) 



where \ is the angle of the rotation represented by R, and 
n = n x x + n y y + n z z is a unit vector in K 3 directed along 
the corresponding axis of rotation. The unit quaternion 
r corresponding to R is given by 

t = (cos(x/2),sm(x/2)ri a ,sin(x/2)n 2/ ,sin(x/2)n z ). 

(B9) 

In outline, our orientation-recovery method consists of 
the following three basic steps: 



1. Use Diffusion Map to compute the first nine non 
trivial ei 
erator P 



trivial eigenfunctions ip^ . . . , ip & of the diffusion op- 



2. Determine the linear combination coefficients ajk 
to transform the ij) eigenvectors to (approximate) 
rotation matrices by solving the nonlinear least 
squares problem in Eq. (B5|; 



3. Use the correspondence in Eq. (B4) and polar de- 



composition [Eq. (B6|] to assign a rotation matrix 
R; to each observed diffraction pattern, and (op- 
tionally) extract the parameters of R; in the SO(3) 
parameterization of interest. 

A high-level description of the orientation-recovery 
procedure for noise-free data is given in Table |III| 

Before closing this section, we comment briefly on 
methods for choosing the kernel width e and the num- 
ber of retained nearest neighbors d. This is a common 
issue in manifold-embedding algorithms, and a number 
of strategies for determining these parameters have been 
developed in the literature [TU1|30]. Here, we determine e 
and d a posteriori, by monitoring the value of the least- 
squares residual Q* . Specifically, in the applications pre- 
sented in Sec.|Vj we start with a number of nearest neigh- 
bors d ~ 100 and a value of e of order the mean distance 
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TABLE III. Orientation-recovery for noise-free snapshots us- 
ing Diffusion Map 

Inputs: 

Noisc-frcc snapshots M — {a ± , ■ ■ ■ , a } 
Number of retained nearest neighbors d 
Number of datapoints in the least-squares fit, r 
Gaussian width e 

Outputs: 

Estimated quaternions T — {ri , . . . , r s } 
Nearest-neighbor index matrix N 
Least-squares residual G* 

1: Compute the s X d matrices N and S such that 

Nij — index of j-th nearest neighbor to snapshot , 
S H = IK - Sitfy II- 

2: return N 

3: Compute the sparse transition probability matrix P using the algo- 
rithm in Table [TT] with inputs S, N, e, and a — 1. 

4: Solve the sparse eigenvalue problem P^ fc — ^k'4 ) k for < k < 9 
and 1 — Ao < Ai < ■ ■ ■ < Ag. 

5: Solve the nonlinear least-squares problem jB5[ . 

6: return Q* 

7: for i — 1, . . . , s do 

8: Comp ute an approximate SO(3) matrix R^ for snapshot a t via 
Eq. p4| . 

9: Project Ri to an orthogonal matrix R^ using Eq. ( |B6t 
10: Convert R^ to a unit quaternion via Eqs. |B7jHm9l. 
11: return Ti 
12: end for 



to the (<i/2)-th nearest neighbors of the datapoints, and 
then perform successive refinements of these parameters 
seeking to minimize Q* . 
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